A New approach for Classification of Highly Imbalanced Datasets using Evolutionary Algorithms

نویسندگان

  • Satyam Maheshwari
  • Sanjeev Sharma
چکیده

Today’s most of the research interest is in the application of evolutionary algorithms. One of the examples is classification rules in imbalanced domains. The problem of Imbalanced data sets plays a major challenge in data mining community. In imbalanced data sets, the number of instances of one class is much higher than the others, and the class of fewer representatives is of more interest from the point of the learning task. Traditional Machine Learning algorithms work well with balanced data sets, but not able to deal with classification of imbalanced data sets. In the present paper we use different operators of Genetic Algorithms (GA) for over-sampling to enlarge the ratio of positive samples, and then apply clustering to the over-sampled training dataset as a data cleaning method for both classes, removing the redundant or noisy samples. The proposed approach was experimentally analyzed and the experimental results shows an improvement in the classification measured as the area under the receiver operating characteristics (ROC) curve.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

متن کامل

ارائه‌روش جدید مبتنی‌بر برنامه‌نویسی ژنتیک برای وزن‌دهی قوانین فازی در طبقه‌بندی نامتوازن

In classification problems, we often encounter datasets with different percentage of patterns (i.e. classes with a high pattern percentage and classes with a low pattern percentage). These problems are called “classification Problems with imbalanced data-sets”. Fuzzy rule based classification systems are the most popular fuzzy modeling systems used in pattern classification problems. Rule weights...

متن کامل

Spectral-spatial classification of hyperspectral images by combining hierarchical and marker-based Minimum Spanning Forest algorithms

Many researches have demonstrated that the spatial information can play an important role in the classification of hyperspectral imagery. This study proposes a modified spectral–spatial classification approach for improving the spectral–spatial classification of hyperspectral images. In the proposed method ten spatial/texture features, using mean, standard deviation, contrast, homogeneity, corr...

متن کامل

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...

متن کامل

An Evolutionary Multi-objective Discretization based on Normalized Cut

Learning models and related results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results are tending to be incorrect. Therefore, discretization as one of the preprocessing techniques plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011